Naive Bayes classifiers (Two Classes)

Approaches

Approach 1: Using LabelEncoder


To graph the features and classes, they must first be converted to numbers. Use the LabelEncoder method to convert the string labels into numbers. LabelEncoder encodes one column at a time, so after encoding, the features must be zipped back into a single list. Although it is possible to use LabelEncoder for features, best practice is to use the OrdinalEncoder, which we cover in Approach 2. More information is available in the SciKit Learn User Guide: https://scikit-learn.org/stable/modules/preprocessing.html#preprocessing-categorical-features.

 

# Import LabelEncoder
from sklearn import preprocessing

# Raw string-valued features (the weather dataset used throughout this page)
weather = ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Rainy', 'Overcast',
           'Sunny', 'Sunny', 'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy']
temp = ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool',
        'Mild', 'Cool', 'Mild', 'Mild', 'Mild', 'Hot', 'Mild']

# Creating LabelEncoder
le = preprocessing.LabelEncoder()

# Converting string feature labels into numbers
weather_encoded = le.fit_transform(weather)
temp_encoded = le.fit_transform(temp)
print("Weather:", weather_encoded)
print("Temp:", temp_encoded)

Weather: [2 2 0 1 1 1 0 2 2 1 2 0 0 1]
Temp: [1 1 1 2 0 0 0 2 0 2 2 2 1 2]
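To check which number was assigned to each category, the fitted encoder exposes a `classes_` attribute: labels receive codes in sorted (alphabetical) order. A quick sketch, rebuilding the `weather` list from the dataset above:

```python
from sklearn import preprocessing

weather = ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Rainy', 'Overcast',
           'Sunny', 'Sunny', 'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy']

le = preprocessing.LabelEncoder()
le.fit(weather)

# classes_ holds the labels in the order of their numeric codes
mapping = {c: int(code) for c, code in zip(le.classes_, le.transform(le.classes_))}
print(mapping)  # {'Overcast': 0, 'Rainy': 1, 'Sunny': 2}
```

This is why the comments in the prediction cells below read 0:Overcast and 2:Mild.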

 

The class labels also need to be encoded.


# The class labels, also strings
play = ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes',
        'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']

# Converting string class labels into numbers
label = le.fit_transform(play)
print("Play:", label)

Play: [0 0 1 1 1 0 1 0 1 1 1 1 1 0]


The LabelEncoder method encodes each feature individually, so the encoded features need to be combined afterward. This extra step is not needed with the OrdinalEncoder method, which we will look at later in this notebook.


# Combining weather and temp into single list of tuples
features=list(zip(weather_encoded,temp_encoded))
print(features)
 
[(2, 1), (2, 1), (0, 1), (1, 2), (1, 0), (1, 0), (0, 0), (2, 2), (2, 0), (1, 2), (2, 2), (0, 2), (0, 1), (1, 2)]


Next, train the model based on the dataset and then return the prediction based on a new value: overcast weather and mild temperatures.

 

TASK: Try changing the variables to predict if the match would take place if it was overcast and hot.

What about sunny and hot?


# Import Categorical Naive Bayes model
from sklearn.naive_bayes import CategoricalNB

# Create a Categorical Classifier
model = CategoricalNB()

# Train the model using the training sets
model.fit(features, label)

# Predict Output
predicted = model.predict([[0, 2]])  # 0:Overcast, 2:Mild

# Predict probability
predict_probability = model.predict_proba([[0, 2]])
print("Predicted Value:", le.inverse_transform(predicted), " with ", predict_probability)
Predicted Value: ['Yes'] with [[0.13043478 0.86956522]]
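To see where the 0.8696 comes from, we can reproduce the estimate by hand. CategoricalNB applies Laplace smoothing (alpha=1 by default) over the 3 categories of each feature, then multiplies the class prior by the smoothed per-feature likelihoods and normalizes. A sketch using the encoded lists printed earlier:

```python
# Reproduce P(Yes | Overcast, Mild) by hand, matching CategoricalNB's
# default Laplace smoothing (alpha=1)
weather = [2, 2, 0, 1, 1, 1, 0, 2, 2, 1, 2, 0, 0, 1]  # 0:Overcast 1:Rainy 2:Sunny
temp    = [1, 1, 1, 2, 0, 0, 0, 2, 0, 2, 2, 2, 1, 2]  # 0:Cool 1:Hot 2:Mild
play    = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0]  # 0:No 1:Yes

def p_given_class(feature, value, cls, n_categories, alpha=1.0):
    """Smoothed P(feature = value | play = cls)."""
    in_class = [f for f, y in zip(feature, play) if y == cls]
    count = sum(1 for f in in_class if f == value)
    return (count + alpha) / (len(in_class) + alpha * n_categories)

scores = {}
for cls in (0, 1):
    prior = play.count(cls) / len(play)
    scores[cls] = (prior
                   * p_given_class(weather, 0, cls, 3)   # Overcast
                   * p_given_class(temp, 2, cls, 3))     # Mild

p_yes = scores[1] / (scores[0] + scores[1])
print(round(p_yes, 8))  # 0.86956522, matching predict_proba above
```

Note that Overcast never co-occurs with Play=No in the data; without smoothing, that zero count would force P(No | Overcast, Mild) to exactly 0.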


Approach 2: Using OrdinalEncoder for features


When the dataset has more than one feature, it is best to first combine the features into a single list and then encode them using the OrdinalEncoder method. More information is available in the SciKit Learn User Guide https://scikit-learn.org/stable/modules/preprocessing.html#preprocessing-categorical-features.
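OrdinalEncoder is the column-wise counterpart of LabelEncoder: it fits all feature columns at once and stores one category array per column in its `categories_` attribute, again in sorted order. A small self-contained sketch (using a three-row subset for brevity):

```python
from sklearn import preprocessing

training_set = [('Sunny', 'Hot'), ('Overcast', 'Mild'), ('Rainy', 'Cool')]

enc = preprocessing.OrdinalEncoder()
enc.fit(training_set)

# categories_ holds one array per column, in the order of the numeric codes
cols = [[str(x) for x in col] for col in enc.categories_]
print(cols)  # [['Overcast', 'Rainy', 'Sunny'], ['Cool', 'Hot', 'Mild']]
```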


# Get dataset with string features
training_set=list(zip(weather, temp))
print(training_set)
 
[('Sunny', 'Hot'), ('Sunny', 'Hot'), ('Overcast', 'Hot'), ('Rainy', 'Mild'), ('Rainy', 'Cool'), ('Rainy', 'Cool'), ('Overcast', 'Cool'), ('Sunny', 'Mild'), ('Sunny', 'Cool'), ('Rainy', 'Mild'), ('Sunny', 'Mild'), ('Overcast', 'Mild'), ('Overcast', 'Hot'), ('Rainy', 'Mild')]
# Create Ordinal Encoder
enc = preprocessing.OrdinalEncoder()
 
encoded_training_set = enc.fit_transform(training_set)
print(encoded_training_set)
[[2. 1.]
 [2. 1.]
 [0. 1.]
 [1. 2.]
 [1. 0.]
 [1. 0.]
 [0. 0.]
 [2. 2.]
 [2. 0.]
 [1. 2.]
 [2. 2.]
 [0. 2.]
 [0. 1.]
 [1. 2.]]
target_set = label


We can see that the trained model returns the same result.


# Train model again
model2 = CategoricalNB()
model2.fit(encoded_training_set, target_set)

# Predict Output
predicted2 = model2.predict([[0, 2]])  # 0:Overcast, 2:Mild
predict_probability2 = model2.predict_proba([[0, 2]])

print("Predicted Value:", le.inverse_transform(predicted2), " with ", predict_probability2)
Predicted Value: ['Yes'] with [[0.13043478 0.86956522]]
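A side benefit of keeping the fitted OrdinalEncoder around is that new observations can be passed as raw strings and converted with `enc.transform`, rather than looking up the numeric codes by hand. A self-contained sketch rebuilding the dataset and model from this page:

```python
from sklearn import preprocessing
from sklearn.naive_bayes import CategoricalNB

weather = ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Rainy', 'Overcast',
           'Sunny', 'Sunny', 'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy']
temp = ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool',
        'Mild', 'Cool', 'Mild', 'Mild', 'Mild', 'Hot', 'Mild']
play = ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes',
        'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']

# Encode features jointly and labels separately
enc = preprocessing.OrdinalEncoder()
le = preprocessing.LabelEncoder()
X = enc.fit_transform(list(zip(weather, temp)))
y = le.fit_transform(play)

model = CategoricalNB().fit(X, y)

# Encode a new observation directly from its raw strings
new_obs = enc.transform([('Overcast', 'Mild')])
print(le.inverse_transform(model.predict(new_obs)))  # ['Yes']
```

This avoids a silent mismatch if the hand-written codes drift out of sync with the encoder's category ordering.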

 
